Abstract—With the aggressive downscaling of process technologies
and the importance of battery-powered systems, reducing
leakage power consumption has become a crucial design chal- 
lenge for IC designers. In addition, the traditional bulk CMOS 
technologies face significant challenges related to short-channel 
effects and process variations. FinFET devices have attracted 
a lot of attention as an alternative to bulk CMOS in sub-32nm 
technology nodes. This paper presents a device-circuit cross-layer 
framework to utilize fine-grained gate-length biased FinFETs for 
circuit leakage power reduction in near- and super-threshold (VT)
operation regimes. The impacts of cell-level and transistor-level 
Gate-Length Biasing (GLB) on circuit speed and leakage power 
are studied using a 7nm FinFET technology. 


Index Terms—Gate-Length Biasing (GLB), Leakage Power, 
7nm FinFET Technology, Near-threshold Computing. 


I. INTRODUCTION 


With the severe restrictions placed by cooling and battery 
life constraints today, power efficiency has become the key 
to sustaining a continued performance enhancement in future 
VLSI circuits, since it directly affects the thermal margin, 
circuit performance and reliability [2-6]. Ultra-Low Voltage (ULV)
CMOS operations, where the supply voltage is scaled down
to near or below the threshold voltage (VT) of transistors,
have emerged as a particularly effective technique for reducing
circuit power consumption [7], [8]. According to [9], voltage
scaling from the super-VT regime down to the near-VT regime
yields energy savings on the order of 10X. The ULV operations
are especially beneficial for performance-relaxed and energy- 
constrained applications such as portable wireless devices, 
implantable medical devices, and sensor network nodes [10]. 
Recent advances, such as heterogeneous architectures,
low-voltage SRAM and latch topologies, integrated power delivery,
packaging and I/O, suggest that the practical realization of the
power efficiency benefits of ULV operations is within reach
[2].

Another revolution in the VLSI industry in recent years is 
the introduction of multi-gate or tri-gate transistor structures 
such as FinFETs. It is well known that the steady down-scaling 
of the feature size of bulk CMOS technology has resulted in 
Short-Channel Effects (SCE), such as Drain Induced Barrier 
Lowering (DIBL) and VT roll-off [11]. The SCEs limit the
bulk CMOS transistor scaling in deep-submicron regions [12], 
[13], which inevitably erodes the expected power efficiency 
achieved by applying the ULV operations in CMOS tech- 
nology. The multi-gate or tri-gate transistor structures such 
as FinFETs are proposed to rejuvenate the chip industry by 
rescuing it from the SCEs [13]. The improved electrostatic 
integrity of FinFET devices can alleviate SCEs and further 
lower supply voltages to improve the power efficiency, making 
such devices especially advantageous for near- and super-VT
operations [2]. FinFET devices are reported to be up to 37%
faster while consuming less than half the dynamic power, and
to cut the static leakage current by as much as 90% compared
to bulk CMOS devices [13]. In addition, the low (or absent)
channel doping in FinFETs may eliminate random dopant
fluctuation, which is a major source of process-induced
variations in conventional CMOS technology [2]. Therefore,
FinFETs are promising candidates to replace bulk CMOS at
the 22-nm technology node and beyond [13].

In addition to the above-mentioned SCEs, the down-scaling 
of layout geometries has also resulted in an explosive increase 
in leakage current in recent generations [11]. Leakage has 
become a critical component of the total dissipated power of 
VLSI circuits with its contribution projected to over 50% [14]. 
To overcome the issue of high leakage power in conventional 
CMOS technology, many circuit-level techniques have been 
commonly leveraged such as gate sizing [15], [16], Gate- 
Length Biasing (GLB) [5], [17], sleep mode approach [3], 
stack mode approach [3], multi-Vdd [18] and Dual-VT [3],
[17]. 

However, there is a lack of thorough investigation of the 
aforementioned leakage power reduction techniques for the 
deeply-scaled FinFET circuits operating in near- and super-VT
voltage regimes. In this paper, we conduct a detailed exploration
by developing a device-circuit cross-layer framework
described in Fig. 1. The impacts of cell-level and transistor-level
Gate-Length Biasing (GLB) on circuit speed and leakage
power are studied using a 7nm FinFET technology.


The contributions of this work are threefold. First, we carry
out a detailed analysis of fine-grained GLB techniques at both
the cell level and the transistor level. Circuit simulation results
show a significant leakage power reduction of about 70% in both
the near- and super-VT regimes. The total power consumption
(comprising both dynamic and leakage power) can also be
significantly reduced by the presented GLB technique.
Meanwhile, the fine-grained GLB technique introduces
27% and 13% penalties in speed, 12% and 10% penalties in
dynamic energy consumption, and 3% area overhead in
these two regimes, respectively. Therefore, the GLB technique
can be generally applied to reduce leakage power consumption
in both the near- and super-VT regimes with relatively minor
impact on area and dynamic power. Second, we create standard
cell libraries using the fine-grained GLB techniques as
well as a Dual-VT technique and use them to synthesize
ISCAS benchmark circuits, in order to compare the GLB
technique with the Dual-VT technique. Synthesis results
demonstrate that the GLB technique i) is able to deliver a
fine-grained trade-off curve between leakage power savings and
circuit speed degradation; and ii) is much more effective than
the Dual-VT technique in the near-VT regime because its
trade-off curve is less sensitive to the supply voltage. Additional
benefits of the GLB technique include a lower fabrication cost,
as it requires no additional manufacturing steps or masks, and
an improved immunity against the line-edge roughness effect.
Finally, we investigate the leakage power saving capability of
the GLB technique versus the granularity of the biased gate
length. We also conduct a detailed analysis of the costs and
benefits of the transistor-level GLB technique, in which each
transistor can individually modify delays of different timing
arcs. Experimental results show a diminishing-return effect that
provides insights into further optimizing the GLB cell library:
a small number of cells can be used to achieve the majority of
leakage power savings.

Fig. 1. Flow of our device-circuit cross-layer framework.


II. RELATED WORK 


Many techniques have been developed to reduce the
leakage power consumption of conventional CMOS circuits,
including the Dual-VT technique and the GLB approach. J. Gu
et al. proposed a Dual-VT approach that simultaneously
incorporates gate sizing and mechanical stress [15]. A. Calimera
et al. presented a temperature-aware Dual-VT technique that
guarantees timing correctness at the boundary temperatures of
the target technology library [19]. The main idea behind these
works is to use less leaky high-VT cells in non-critical paths
to improve power efficiency while using high-speed low-VT
cells in critical paths to satisfy timing constraints.

Manufacturing high-VT and low-VT cells simultaneously
requires extra mask layers and additional doping steps, which
makes the fabrication process more complex [3], [15]. To avoid
this cost, P. Gupta et al. applied the GLB technique on
non-critical paths of digital circuits to reduce the overall
leakage power consumption [5], [17]. The GLB technique proposed
in [5], [17] is based on the fact that leakage power decreases
exponentially while delay increases almost linearly with
increasing gate length. Marginally increasing the gate length
therefore takes advantage of the exponential leakage reduction
while only impairing performance in a linear way [17]. Although
a significant amount of leakage reduction has been demonstrated
in CMOS circuits by applying either the Dual-VT or the GLB
technique, none of the aforementioned works evaluated these
two techniques on deeply-scaled FinFET technology for
power-efficient computing in the near- and super-VT regimes.


III. 7NM FINFET TECHNOLOGY NODE
A. FinFET Device Model 


In order to quantify the effects of applying the GLB
and Dual-VT techniques to deeply-scaled FinFET technology
operating in near- and super-VT regimes, we introduce our
7nm deeply-scaled FinFET devices in this subsection, which
will be used for characterizing libraries of standard cells with
GLB and/or Dual-VT techniques.

Fig. 2 shows the basic structure of a 7nm FinFET device
with three fins. Each fin provides a channel for conducting
current when the device is switched on. Each channel is
wrapped by gate electrodes, which establishes enhanced channel
control and helps suppress the SCEs. The design
parameters of a FinFET device include (i) the fin height hfin,
(ii) the fin width Tfin, and (iii) the gate length (or fin length) LG.
The effective channel width W of a single fin is approximately
twice the fin height hfin. Connecting more fins
in parallel increases the total channel width, resulting in a
stronger driving strength of the FinFET device. Note that the
width of a FinFET device can only be an integer multiple
of W (known as the FinFET's width quantization
property). Besides the above-mentioned design parameters,
there are two additional critical geometry parameters: (i) the
fin pitch, denoted by Pfin, which is defined as the minimum
center-to-center distance of two adjacent parallel fins; and (ii)
the spacer length Lsp, which is related to the SCEs. Both of
these geometry parameters are dictated by the underlying
FinFET technology and fall outside the scope of this paper.

TABLE I
7NM FINFET DEVICE PARAMETERS

Parameter                  Value
Gate Oxide Material        HfO2 + SiO2
Gate Oxide Thickness       1.3nm
Gate Underlap              1.5nm on each side
Source/Drain Doping        1x10°cm—
Nfet Gate Work-Function    4.4eV~4.6eV
Pfet Gate Work-Function    4.7eV~4.9eV
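The width quantization property described above can be illustrated with a short sketch. It assumes the approximation W = 2 x fin height stated in the text and the 14nm fin height listed in Table II; the function names are hypothetical and are not part of the authors' flow.

```python
import math

H_FIN_NM = 14.0  # fin height taken from Table II


def effective_width_nm(num_fins: int, h_fin_nm: float = H_FIN_NM) -> float:
    """Total channel width of a device with num_fins parallel fins.

    Uses the approximation W ~= 2 * h_fin per fin given in the text.
    """
    if num_fins < 1:
        raise ValueError("a FinFET needs at least one fin")
    return num_fins * 2.0 * h_fin_nm


def fins_for_width(target_width_nm: float, h_fin_nm: float = H_FIN_NM) -> int:
    """Smallest integer fin count whose total width meets the target.

    Width quantization: the target is rounded up to a multiple of W.
    """
    return max(1, math.ceil(target_width_nm / (2.0 * h_fin_nm)))


if __name__ == "__main__":
    print(effective_width_nm(3))   # width of the three-fin device in Fig. 2
    print(fins_for_width(60.0))    # a 60nm target needs a whole number of fins
```

Because only whole fins can be added, any requested drive strength is rounded to the next multiple of W, which is why the cell sizing later in this section works with integer fin counts.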

Due to the lack of publicly accessible industrial data for 
deeply-scaled FinFETs, we derived our FinFET device models 
by using Synopsys Sentaurus Device [20] that is included 
in the TCAD tool suite for simulating device performance 
[21][22][23]. We choose Sentaurus Device because it provides 
advanced physics and the ability to add user-defined models 
for investigation of novel structures such as FinFETs [20]. The 
Sentaurus device simulations apply the hydrodynamic carrier 
transport model, OldSlotboom bandgap narrowing model, and
the density gradient quantization correction model. The carrier 
mobility degradation resulting from high doping, high field 
saturation, and scattering at silicon-insulator interfaces is also 
taken into account [22]. The device parameters are shown in 
Table I. The authors of [24] have reported the major
process-related FinFET geometries for 5nm technology, and similar
values can be derived for 7nm technology. For this paper, a
7nm FinFET process with the lambda-based layout design rules
in Table II is developed.


B. Leakage Power Saving Techniques 


Generally, there are two leakage power saving techniques
at the 7nm FinFET technology node: applying Gate-Length
Biasing (GLB) and using Dual-VT devices.

Gate-Length Biasing: The original gate length LG of our
FinFET device is 7nm, and in this paper, we consider GLB
with increased gate lengths up to 9nm. The reason for choosing
9nm as the upper bound is that significantly larger gate lengths
are not layout swappable with nominal versions, which
can result in substantial Engineering Change Order (ECO)
overheads during layout [5]. Similar to the GLB technique
in planar CMOS technology, the small gate-length biases for
FinFET devices can be achieved by slight modification of the
layout.

Fig. 2. (a) Perspective view, and (b) top view of the 7nm FinFET device.

TABLE II
7NM FINFET-SPECIFIC GEOMETRIES AND DESIGN RULES

Parameter   Value (nm)   Comment
LFIN        7            Fin length
TSI         3.5          Fin width
HFIN        14           Fin height
PFIN        10.5         Fin pitch using spacer lithography
tox         1.55         Oxide thickness
Wa          10.5         Minimum contact size
WM2M        10.5         Minimum space
WG2C        7            Minimum space of gate to contact

TABLE III
7NM FINFET DEVICES NAMING CONVENTIONS

Device Name   VT       Gate Length
STD           0.235V   7.0nm
HVT           0.335V   7.0nm
GL05          0.235V   7.5nm
GL10          0.235V   8.0nm
GL15          0.235V   8.5nm
GL20          0.235V   9.0nm

Dual-VT Technique: Unlike the conventional Dual-VT
technique for CMOS technologies, which changes the doping
concentration, we engineer the work-function of gate materials
to increase the VT of the FinFET devices. The VT of the
standard FinFET devices is 0.235V, and the VT of the high-VT
version is 0.335V. Note that fabricating the FinFET circuits
with the Dual-VT technique incurs additional costs in gate
work-function engineering.

In summary, we generate standard FinFET devices with
a 0.235V threshold voltage and a 7nm gate length using the
Synopsys TCAD tool suite. We also generate a set of FinFET
devices with biased (increased) gate lengths up to 9nm and
the standard VT value, as well as high-VT FinFET devices with
a 7nm gate length and an increased VT equal to 0.335V. The
naming conventions for the generated FinFET devices are
shown in Table III. Logic cells built from these FinFET
devices are named accordingly, e.g., the cell name for a 1X
inverter using standard FinFET devices is INV1X_STD.
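As an illustration of the naming scheme, the sketch below encodes Table III as a lookup table and builds cell names from it. The device values come from Table III; the helper names and the pattern applied to cells other than the 1X inverter are assumptions made for illustration.

```python
from typing import NamedTuple


class Device(NamedTuple):
    vt_v: float            # threshold voltage in volts
    gate_length_nm: float  # drawn gate length in nanometers


# Device flavors as listed in Table III.
DEVICES = {
    "STD":  Device(0.235, 7.0),
    "HVT":  Device(0.335, 7.0),
    "GL05": Device(0.235, 7.5),
    "GL10": Device(0.235, 8.0),
    "GL15": Device(0.235, 8.5),
    "GL20": Device(0.235, 9.0),
}


def cell_name(function: str, drive: int, device: str) -> str:
    """Build a cell name such as INV1X_STD (pattern assumed for other cells)."""
    if device not in DEVICES:
        raise KeyError(f"unknown device flavor: {device}")
    return f"{function}{drive}X_{device}"


def gate_length_bias_nm(device: str) -> float:
    """Gate-length increase of a device relative to the nominal STD device."""
    return DEVICES[device].gate_length_nm - DEVICES["STD"].gate_length_nm


if __name__ == "__main__":
    print(cell_name("INV", 1, "STD"))
    print(gate_length_bias_nm("GL20"))
```

The GLxx suffix tracks the bias in steps of 0.5nm, so GL05 through GL20 correspond to gate lengths of 7.5nm through 9.0nm.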


C. FinFET Standard Cell Library 


A standard cell library is a set of high-quality timing and 
power models for standard cells such as INV, NAND and NOR 
gates, required by almost all CAD tools for ASIC designs from 
synthesizing the Register Transfer Level (RTL) netlists to gen- 
erating the final data file format, known as Graphic Database 
System (GDS), which represents the geometric shapes and 
other information about layouts in a hierarchical form. There- 
fore, performing analysis and optimization on the deeply- 
scaled FinFET technology node requires properly designed 
and characterized standard cell libraries. The Liberty library
format (.lib) has been an open industry standard for almost a
decade, and it is used by virtually all EDA implementation, 
analysis and library characterization tools as the library model 
for timing, noise, power and test behavior [25]. Therefore, the 
deeply-scaled FinFET standard cell libraries in this paper are 
built in .lib format. 

The 7nm FinFET device models described above are
specified by lookup tables (LUTs), which can be simulated in
HSPICE through a Verilog-A interface. The standard cell
libraries are built hierarchically: (i) at the library level,
the information on process, supply voltage level, units,
LUTs of the FinFET device model, thresholds for timing
parameters, and operating corners is provided; (ii) at the
cell level, the cell name, area, leakage power, I/O, and various
capacitances are specified and measured; and (iii) at the pin
level, the timing parameters, including rise/fall output slews
and rise/fall propagation delays, and the internal rise/fall
power parameters are stored in a number of 2-D LUTs.
The timing and power parameters of each logic cell in the
7nm FinFET standard library are obtained through HSPICE
simulations with a variety of input stimuli conditions based
on the Verilog-A based 7nm FinFET device model.

The numbers of N-type fins and P-type fins, denoted by FN
and FP, respectively, determine the rise and fall delays of a
FinFET cell. Thus, sizing a FinFET standard cell means
determining FN and FP so as to achieve approximately balanced
rise and fall delays. We follow the transregional FinFET model
as well as the sizing method described in [26]. Based on the
HSPICE simulation results in the near-VT (Vdd = 0.30V) and
super-VT (Vdd = 0.45V) regions, FN and FP of each standard
cell are rounded up to the nearest integer to achieve
equal delays in the pull-down and pull-up networks. For example,
the FP/FN ratios of 1X, 2X, 4X and 8X inverters are 1/1,
2/2, 4/4, and 8/7, respectively.

To evaluate the performance of the presented GLB
technique, we designed several standard cell libraries,
as listed in Table IV. In the GLB libraries (NT_GLB and
ST_GLB), each type of frequently used logic cell
has five versions: one with the nominal gate length and four
with different biased gate lengths. These libraries are used to
test the effectiveness of (fine-grained) GLB in leakage power
minimization on various circuits. In this paper, we start with
a cell-level GLB technique, in which all transistors in a logic
cell share the same gate-length bias. We
further optimize the standard cell library using the
transistor-level GLB technique, in which each transistor can
individually modify delays of different timing arcs. Dual-VT
libraries (NT_DVT and ST_DVT), which consist of standard cells
with two different threshold voltages (nominal VT and high VT),
are used to test the effectiveness of the Dual-VT technique on
circuit benchmarks. Standard libraries (NT_STD and ST_STD) are
reference libraries where no leakage power saving technique
is applied [27][28]. Note that the STD libraries and GLB-based
libraries are comprised of logic cells with nominal VT, and all
cell libraries have two versions, as we characterize them in the
near-VT (Vdd = 0.30V) and super-VT (Vdd = 0.45V) voltage
regimes separately. We also reduce the number of gate length
bias values to form the GLBra, GLBrb and GLBrc libraries, which
have four, three and two gate lengths, respectively, to explore
the opportunities of optimizing the GLB library and reducing
the library complexity.

TABLE IV
NAMING CONVENTIONS FOR LIBRARIES USING DIFFERENT TECHNIQUES
OPERATING IN DIFFERENT REGIMES

Library Name   Operation Regime          Device Types
NT_STD         Near-VT (Vdd = 0.30V)     STD
NT_DVT         Near-VT (Vdd = 0.30V)     STD, HVT
NT_GLB         Near-VT (Vdd = 0.30V)     STD, GL05, GL10, GL15, GL20
NT_GLBra       Near-VT (Vdd = 0.30V)     STD, GL05, GL15, GL20
NT_GLBrb       Near-VT (Vdd = 0.30V)     STD, GL10, GL20
NT_GLBrc       Near-VT (Vdd = 0.30V)     STD, GL20
ST_STD         Super-VT (Vdd = 0.45V)    STD
ST_DVT         Super-VT (Vdd = 0.45V)    STD, HVT
ST_GLB         Super-VT (Vdd = 0.45V)    STD, GL05, GL10, GL15, GL20
ST_GLBra       Super-VT (Vdd = 0.45V)    STD, GL05, GL15, GL20
ST_GLBrb       Super-VT (Vdd = 0.45V)    STD, GL10, GL20
ST_GLBrc       Super-VT (Vdd = 0.45V)    STD, GL20
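To make the library variants concrete, the following sketch records which device flavors each library contains and estimates how the cell count scales. The device lists follow the library descriptions in this section; the base cell count and the assumption that every base cell is characterized once per device flavor are hypothetical.

```python
# Device flavors per library technique, following the descriptions above.
LIBRARY_DEVICES = {
    "STD":   ["STD"],
    "DVT":   ["STD", "HVT"],
    "GLB":   ["STD", "GL05", "GL10", "GL15", "GL20"],
    "GLBra": ["STD", "GL05", "GL15", "GL20"],
    "GLBrb": ["STD", "GL10", "GL20"],
    "GLBrc": ["STD", "GL20"],
}

# Supply voltage (in volts) of the two characterization regimes.
REGIME_VDD = {"NT": 0.30, "ST": 0.45}


def library_name(regime: str, technique: str) -> str:
    """Compose a library name such as NT_GLB or ST_DVT."""
    if regime not in REGIME_VDD or technique not in LIBRARY_DEVICES:
        raise KeyError((regime, technique))
    return f"{regime}_{technique}"


def library_size(technique: str, num_base_cells: int) -> int:
    """Cell count if each base cell is characterized once per device flavor."""
    return num_base_cells * len(LIBRARY_DEVICES[technique])


if __name__ == "__main__":
    for technique in LIBRARY_DEVICES:
        # 20 base cells is an arbitrary example value, not a figure from the paper.
        print(library_name("NT", technique), library_size(technique, 20))
```

Under these assumptions the characterization effort grows linearly with the number of gate-length flavors, which is the cost that the reduced GLBra, GLBrb and GLBrc libraries are meant to limit.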


IV. IMPACTS OF THE FINE-GRAINED GLB TECHNIQUE 


In this section, we focus on the impacts of the fine-grained 
Gate-Length Biasing technique on the 7nm FinFET technology
node. We analyze the impacts of the GLB technique on all 
important performance metrics of the 7nm FinFET circuits, 
including the leakage power consumption, the circuit speed, 
the dynamic energy consumption, as well as the area of 
circuits. 


A. Impact on Leakage Power and Circuit Speed 


GLB is a promising technology that trades off circuit speed 
for leakage power savings by slightly increasing the gate 
length [17], [29]. It is able to deliver a fine-grained trade-off 
between leakage power and speed in deeply-scaled FinFET 
circuits. In this subsection, we investigate the advantages 
and limitations of GLB technique for FinFET logic circuits 
operating in near- and super-VT regimes.

As shown in Fig. 3, a 20-stage FO4 FinFET inverter chain
built from NFETs and PFETs with a 2nm gate-length bias (i.e.,
9nm gate length) achieves up to 69% and 68% leakage
power reductions in the near- and super-VT regimes, respectively,
compared to the leakage power at the nominal 7nm
gate length. The reduction of leakage power comes at the cost
of degraded circuit speed. One can observe in Fig. 3
that the propagation delay, which is measured at 50%-50%,
increases by 27% and 13% in the near- and super-VT regimes,
respectively. Therefore, the observations in Fig. 3 strongly
justify our methodology, which leverages FinFET devices with
slightly biased LG to achieve a significant amount of leakage
power reduction at the cost of relatively minor performance
degradation.

Fig. 3. Normalized leakage and delay of a 20-stage FO4 inverter
chain with different biased gate lengths in (a) the near-VT regime
(Vdd = 0.30V) and (b) the super-VT regime (Vdd = 0.45V).

Compared to previous work utilizing the GLB technique
[17], [29], the GLB technique is more effective for
deeply-scaled FinFET circuits in the sense that more leakage power
savings are achieved for the same amount of circuit speed
degradation. For instance, measurement results for a single NMOS
transistor at 130nm show an 18% leakage power reduction
at the cost of 15% delay degradation after biasing its gate
length to 150nm. The reason that more leakage power is
saved is that the deeply-scaled FinFET technology is heavily
affected by the SCEs, including the VT roll-off effect and the
DIBL effect. Both effects result in an increase of VT when the
gate length increases (the DIBL effect specifies that VT is a
linear function of VDS [30], and the linear coefficient depends
on the gate length [31]). Therefore, the GLB technique, which
biases the gate length, alleviates the impact of the VT roll-off
and DIBL effects and in return results in a higher VT. The
higher VT is the major factor responsible for the significant
leakage power reduction in the deeply-scaled FinFET technology.
In contrast, SCEs are negligible at the 130nm technology node.
Also, the GLB technique is more useful at deeply-scaled
technology nodes because the leakage power consumption plays a
more important role in the total power consumption.
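The link between a higher VT and lower leakage can be made concrete with the standard subthreshold relation, in which the off-current drops by one decade for every S millivolts of VT increase. The sketch below is a back-of-the-envelope estimate only: it assumes leakage is dominated by subthreshold current, and the swing value S = 65 mV/decade is an illustrative assumption that is not taken from the paper.

```python
import math


def leakage_ratio(delta_vt_mv: float, swing_mv_per_dec: float) -> float:
    """Leakage after a VT increase, normalized to the leakage before it.

    Subthreshold model: I_off is proportional to 10 ** (-VT / S).
    """
    return 10.0 ** (-delta_vt_mv / swing_mv_per_dec)


def implied_vt_shift_mv(leakage_reduction: float, swing_mv_per_dec: float) -> float:
    """VT increase that would explain a given fractional leakage reduction."""
    if not 0.0 <= leakage_reduction < 1.0:
        raise ValueError("leakage_reduction must be in [0, 1)")
    return swing_mv_per_dec * math.log10(1.0 / (1.0 - leakage_reduction))


if __name__ == "__main__":
    swing = 65.0  # assumed subthreshold swing in mV/decade (not from the paper)
    shift = implied_vt_shift_mv(0.69, swing)  # 69% reduction reported at LG = 9nm
    print(f"implied VT shift: {shift:.1f} mV")
    print(f"check: normalized leakage = {leakage_ratio(shift, swing):.2f}")
```

Under these assumptions, a VT increase of a few tens of millivolts is enough to account for the reported leakage reduction, which is consistent with the argument that the SCE-induced VT shift is the dominant mechanism.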

Another important observation is that under the same GLB,
the normalized leakage reduction is very close in the near- and
super-VT regimes, while the normalized delay penalty in the
near-VT regime is about twice that of the super-VT regime (i.e.,
27% and 13% normalized delay increase at LG = 9nm in the near-
and super-VT regimes, respectively). The biased gate length
affects the circuit speed through two factors: i) a longer gate
length reduces the driving strength; and ii) the VT roll-off and
DIBL effects result in a slightly higher VT. The former factor
impacts circuits operating in the near- and super-VT regimes
equally. However, the latter one degrades the circuit speed
in a polynomial manner (according to the alpha-power law) in the
super-VT regime and more significantly in the near-VT regime.
Therefore, longer normalized delays are observed in Fig. 3 (a),
compared to Fig. 3 (b).
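The sensitivity argument above can be sketched numerically with the alpha-power law, in which gate delay is proportional to Vdd / (Vdd - VT)^alpha. The exponent alpha = 1.3 and the 10mV VT shift used below are illustrative assumptions, not values from the paper. The alpha-power law is also only a rough guide close to the threshold, where, as noted above, the true degradation is larger still.

```python
def alpha_power_delay(vdd: float, vt: float, alpha: float = 1.3) -> float:
    """Gate delay (arbitrary units) from the alpha-power law."""
    overdrive = vdd - vt
    if overdrive <= 0.0:
        raise ValueError("alpha-power law requires Vdd > VT")
    return vdd / overdrive ** alpha


def delay_increase(vdd: float, vt: float, delta_vt: float, alpha: float = 1.3) -> float:
    """Fractional delay increase caused by raising VT by delta_vt volts."""
    before = alpha_power_delay(vdd, vt, alpha)
    after = alpha_power_delay(vdd, vt + delta_vt, alpha)
    return after / before - 1.0


if __name__ == "__main__":
    vt_nominal = 0.235  # VT of the STD device (Table III)
    delta_vt = 0.010    # assumed VT shift caused by the gate-length bias
    for label, vdd in (("near-VT", 0.30), ("super-VT", 0.45)):
        print(label, f"{delay_increase(vdd, vt_nominal, delta_vt):.1%}")
```

Because the overdrive Vdd - VT is much smaller at Vdd = 0.30V than at 0.45V, the same VT shift removes a larger fraction of it, so the relative delay penalty is noticeably higher in the near-VT regime.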


B. Impact on Dynamic Power and Total Power 


Applying the GLB technique also affects the dynamic power
consumption. Fig. 4 shows the normalized average dynamic
power consumption of a 20-stage FO4 inverter chain, measured
using HSPICE at different biased gate lengths. Note
that the measured power results are not very smooth, due to the
limited precision of the LUTs used in the FinFET device model.
Trend lines are added in Fig. 4 to average out those
uncertainties in the measurements. One can observe that the
overhead of dynamic power consumption is less than 10% when the
gate length is biased to 9nm in both the near- and super-VT
voltage regimes. Compared to the impacts on circuit speed and
leakage power consumption shown in Fig. 3, we conclude that
the GLB technique has a secondary effect on the dynamic
power consumption. Of note, we also capture the variations in
dynamic energy consumption in the GLB standard
cell libraries.

Fig. 4. Impact of different biased gate lengths on the dynamic energy
consumption of a 20-stage FO4 inverter chain.

We also investigate the impact of the presented fine-grained 
GLB technique on the total (averaged over time) power 
consumption of benchmark circuits in both regimes and show 
the results in Fig. 5. The total (averaged) power consumption 
is comprised of both dynamic power and leakage power con- 
sumptions. One can see that FinFET circuits synthesized using 
GLB libraries achieve total power savings over all different 
delay constraints in both regimes. In particular, we observe a 
small amount of total power savings when the delay constraint 
is tight because the leakage power consumption plays a less 
important role compared with dynamic power consumption 
in this case. However, for applications with relaxed delay 
constraints, significant total power reductions of up to 52%
in the near-VT regime, and up to 31% in the super-VT
regime are observed. Therefore, although the presented
fine-grained GLB technique consumes slightly more dynamic
power, it still results in significant savings in total power
consumption without any timing performance penalty. This
is mainly because the GLB technique is very effective in
reducing the leakage power consumption, which has become
very important in deeply-scaled technology nodes, across
multiple voltage regimes.
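The dependence of total power savings on the leakage share can be sketched with simple arithmetic. The example below assumes, hypothetically, that every cell receives the maximum bias, so that leakage falls by 69% and dynamic power rises by 10% (the inverter-chain figures quoted earlier for the near-VT regime). In a synthesized circuit only cells on non-critical paths are biased, so this is an illustration of the trend rather than a prediction.

```python
def total_power_saving(leakage_fraction: float,
                       leakage_reduction: float,
                       dynamic_overhead: float) -> float:
    """Fractional total power saving after applying GLB.

    leakage_fraction is the share of leakage in the original total power;
    the remainder is treated as dynamic power.
    """
    if not 0.0 <= leakage_fraction <= 1.0:
        raise ValueError("leakage_fraction must be in [0, 1]")
    dynamic_fraction = 1.0 - leakage_fraction
    new_total = (leakage_fraction * (1.0 - leakage_reduction)
                 + dynamic_fraction * (1.0 + dynamic_overhead))
    return 1.0 - new_total


def break_even_leakage_fraction(leakage_reduction: float,
                                dynamic_overhead: float) -> float:
    """Leakage share above which GLB yields a net total power saving."""
    return dynamic_overhead / (leakage_reduction + dynamic_overhead)


if __name__ == "__main__":
    for share in (0.1, 0.3, 0.5, 0.7):
        saving = total_power_saving(share, 0.69, 0.10)
        print(f"leakage share {share:.0%}: total saving {saving:+.1%}")
    print(f"break-even share: {break_even_leakage_fraction(0.69, 0.10):.1%}")
```

The sketch reproduces the qualitative trend reported above: the larger the leakage share of total power, the larger the net benefit of gate-length biasing.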


C. Area Overhead 


The GLB technique also results in an area overhead. To
investigate the area overhead, we first compare the layout of a
single cell with the nominal gate length (STD) to that of the same
cell with the maximum biased gate length (GL20). We refer
to the standard cell layout designed in [27]. Fig. 6 shows the
comparison between a standard 1X NAND gate and a 1X NAND
gate with the maximal gate-length bias. It can be observed
that the area overhead of the gate-length biased 1X NAND
gate is approximately 4%. Note that the distance between the Vdd
and Gnd metals remains the same in all GLB cells. The
investigation of the layouts of other cells shows that the area
overheads for GLB cells are approximately 1-4%, depending
on the sizing and the degree of gate-length bias. To further
estimate the area overhead of an entire circuit, we synthesize the
ISCAS benchmark circuits using i) a library that only applies
the nominal gate length (NT_STD); and ii) a library that
contains logic cells with 7nm, 7.5nm, 8.0nm, 8.5nm and 9nm
gate lengths (NT_GLB). Then we compare the netlists generated
using these two libraries and estimate the total circuit area.
As an example, the area overhead of c3540 is approximately
3.31%, which is acceptable given the significant leakage
power saving of 43% achieved by the GLB technique.


V. COMPARISONS BETWEEN GATE-LENGTH BIASING AND
DUAL-VT TECHNIQUES


A. Basic Cell Analysis 


We compare the cell speed and leakage power of different
runtime leakage power saving techniques, namely, the GLB
technique and the Dual-VT technique. Fig. 7 (a), (b), and
(c) compare normalized delay and leakage power results for
some basic cells, namely the 1X inverter, 1X 2-input NAND
gate, and 1X 2-input NOR gate, respectively, in the near- and
super-VT voltage regimes. STD, GL05, GL10, GL15, and
GL20 denote cells with gate lengths of 7nm (unbiased),
7.5nm, 8nm, 8.5nm, and 9nm, respectively. All of them are
made with FinFET devices having the lower VT of 0.235V. The
HVT cell has a higher threshold voltage of 0.335V. Results
in Fig. 7 show that, although using a high VT can reduce the
leakage power significantly, it results in a large delay penalty.
In addition, due to the limitations of fabrication technology, it
is not practical to continuously modify the gate work-functions
and generate fine-grained threshold voltages. In contrast, the
GLB technique provides a way to produce fine-grained
trade-offs between the leakage power reduction and the circuit
speed degradation.

An important observation from Fig. 7 is that the Dual-VT
technique and the GLB technique result in distinct impacts on
circuit speed in different voltage regimes. When operating in
the super-VT regime, the Dual-VT technique achieves more
than 90% leakage power reduction with a 2X delay penalty.
However, the delay penalty increases to 6-8X when the supply
voltage is reduced to the near-VT regime. Compared to the
Dual-VT technique, the GLB technique is more robust to the
supply voltage in the sense that the trade-off points are less
dependent on the supply voltage. This is due to the following
two reasons: (i) the GLB technique mitigates the DIBL effect,
which lowers the VT value at Vdd = 0.45V relative to
Vdd = 0.30V, and (ii) the VT value of the HVT device is
0.335V, which is higher than the supply voltage Vdd = 0.30V
in the near-VT regime and makes the gate delay exponentially
dependent on the supply voltage.

Fig. 5. Total power consumptions of some ISCAS benchmark circuits
synthesized using STD and GLB library in the (a) near-VT regime and (b)
super-VT regime.

Fig. 6. Layout geometries of 7nm FinFET NAND gates with LG = 7nm and
9nm, respectively.

The robustness property makes the GLB technique more
effective in the low supply voltage regime. For example,
consider a non-critical path with a delay that is half of the
critical path delay. When operating in the super-VT regime,
most cells along this path can be replaced by high-VT cells
to reduce the leakage power consumption. However, if the
supply voltage drops into the near-VT regime, this non-critical
path becomes the actual critical path and may cause a timing
violation because the delays of all high-VT cells are increased
by 6X to 8X. Therefore, the Dual-VT technique becomes
impractical for circuits operating in the near-VT regime or
in multiple voltage regimes. In contrast, the GLB technique is
still effective because the relative delay penalties of GLB cells
are more robust to changes of the supply voltage. The
non-critical paths in the super-VT regime are more likely to
remain non-critical when operating in the near-VT regime.
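The example above amounts to a simple slack check, sketched below. Delays are normalized to the critical path delay in the same voltage regime, and every cell on the non-critical path is assumed to be swapped. The penalty factors are the approximate values quoted in the text (2X and 6X for HVT cells, 1.13X and 1.27X for the 9nm inverter chain), and the function names are hypothetical.

```python
def path_meets_timing(path_delay: float, critical_delay: float, penalty: float) -> bool:
    """True if the path still fits within the critical delay after every
    cell on it is slowed down by the given penalty factor."""
    return path_delay * penalty <= critical_delay


def max_tolerable_penalty(path_delay: float, critical_delay: float) -> float:
    """Largest uniform slowdown the path can absorb without a violation."""
    return critical_delay / path_delay


if __name__ == "__main__":
    path, critical = 0.5, 1.0  # non-critical path at half the critical delay
    # Approximate delay penalties quoted in the text for each regime.
    penalties = {
        "HVT, super-VT":  2.0,
        "HVT, near-VT":   6.0,   # lower end of the 6X to 8X range
        "GL20, super-VT": 1.13,
        "GL20, near-VT":  1.27,
    }
    print("tolerable penalty:", max_tolerable_penalty(path, critical))
    for name, penalty in penalties.items():
        print(name, "->", "ok" if path_meets_timing(path, critical, penalty) else "violation")
```

A swap that is safe at Vdd = 0.45V can therefore violate timing at Vdd = 0.30V with HVT cells, whereas the GL20 penalty stays well within the available slack in both regimes.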


B. ISCAS 85 Benchmark Results 


To compare the results of leakage energy reduction between 
GLB and Dual-VT techniques under the same delay overhead,
we synthesize ISCAS 85 benchmark circuits based on the 
generated standard cell libraries by using the Synopsys Design 
Compiler. The leakage power results are reported by the 
Design Compiler. 

Fig. 7. Comparisons of normalized delay and leakage results
for different gate-length biased cells and high-Vr cell at
near-Vr and super-Vr voltage regimes. Cells from left to right
are 1X inverter (a), 1X NAND2 (b), and 1X NOR2 (c).

We first compare the leakage power reductions achieved by
the presented GLB technique and the Dual-Vr technique in
different voltage regimes. Table V lists the leakage power
consumption after synthesizing various benchmark circuits
with the NT_STD, NT_DVT, and NT_GLB libraries under the same
delay constraint (gate-length biased devices or HVT devices
are used in non-critical paths). Leakage power reductions are
reported relative to the leakage power consumption obtained
with the reference NT_STD library. One can observe that the
presented GLB technique reduces the leakage power consumption
by up to 63.2% without any delay penalty, compared to the
results when no leakage power saving technique is applied. An
average leakage power reduction of 45.9% is achieved across
all benchmark circuits tested. In contrast, the results in
Table V show that the Dual-Vr technique achieves only a 7%
leakage power reduction on average. The presented GLB
technique therefore improves the leakage power saving
capability by 6.55X on average in the near-Vr regime relative
to the Dual-Vr technique. This is because of the GLB's
robustness as analyzed in Section V-A.
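The reduction figures in Table V are the leakage values of each library expressed relative to NT_STD. As a minimal sketch (the function names and the three-row excerpt are our own illustration and not part of the synthesis flow), they can be recomputed from the reported leakage values as follows:

```python
# Illustrative only: recompute "Leakage Reduction" entries of Table V
# from the reported leakage values (nW). The dictionary holds a
# three-row excerpt of Table V: circuit -> (NT_STD, NT_DVT, NT_GLB).
TABLE_V_EXCERPT = {
    "c499": (335.8, 327.1, 153.6),
    "c880a": (288.1, 234.5, 153.5),
    "c1908": (350.9, 298.7, 200.7),
}


def leakage_reduction_pct(reference_nw: float, optimized_nw: float) -> float:
    """Leakage reduction in percent, relative to the reference library."""
    return 100.0 * (reference_nw - optimized_nw) / reference_nw


def summarize(table: dict) -> dict:
    """Return circuit -> (Dual-Vr reduction %, GLB reduction %), rounded to 0.1."""
    return {
        name: (
            round(leakage_reduction_pct(std, dvt), 1),
            round(leakage_reduction_pct(std, glb), 1),
        )
        for name, (std, dvt, glb) in table.items()
    }


if __name__ == "__main__":
    for circuit, (dvt_pct, glb_pct) in summarize(TABLE_V_EXCERPT).items():
        print(f"{circuit}: Dual-Vr {dvt_pct}%  GLB {glb_pct}%")
```

Because the tabulated leakage values are themselves rounded, a recomputed percentage may differ from a tabulated one in the last digit.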

Table VI compares the leakage power consumptions and leakage
power reductions of the two techniques in the super-Vr
regime. One can observe that the Dual-Vr technique slightly
outperforms the presented technique in this condition. This is
because the relative delay penalty of the Dual-Vr technique
is much smaller in the super-Vr regime, so a large number
of cells in non-critical paths can be replaced by high-Vr
cells. The results in Table VI agree with our observations
in Fig. 7 and the analysis in Section V-A. Nevertheless, the
leakage power saving capability of the presented GLB technique
remains comparable to that of the Dual-Vr technique.


VI. OPTIMIZING THE GLB CELL LIBRARY 


The Gate-Length-Biasing technique has shown great promise
in leakage power reduction with a relatively minor impact on
circuit delay, area and dynamic power. However, applying the
proposed GLB technique requires additional cell library
characterization steps and a significantly larger library.
For example, the number of cells in our complete GLB library
(which includes cells with four levels of gate-length biasing
granularity) is four times larger than that of a nominal
standard cell library. Because library size affects the
synthesis speed and eventually the speed of the entire circuit
design flow, a trade-off should be found between leakage power
reduction effectiveness and the number of logic cells in the
GLB library.


TABLE V
LEAKAGE POWER CONSUMPTION COMPARISON AMONG STD, DVT
AND GLB LIBRARIES IN NEAR-Vr REGIME WITHOUT DELAY PENALTY

Circuits | NT_STD  | NT_DVT  | NT_DVT    | NT_GLB  | NT_GLB
         | Leakage | Leakage | Leakage   | Leakage | Leakage
         | (nW)    | (nW)    | Reduction | (nW)    | Reduction
c432     | 133.4   | 131.2   | 1.7%      | 49.1    | 63.2%
c499     | 335.8   | 327.1   | 2.6%      | 153.6   | 54.3%
c880a    | 288.1   | 234.5   | 18.6%     | 153.5   | 46.7%
c1355    | 366.9   | 346.5   | 5.6%      | 154.0   | 58.0%
c1908    | 350.9   | 298.7   | 14.9%     | 200.7   | 42.8%
c2670    | 487.3   | 456.2   | 6.4%      | 208.1   | 57.3%
c3540    | 865.3   | 836.5   | 3.3%      | 612.2   | 29.3%
c6288    | 2107.0  | 1895.7  | 11.1%     | 1394.4  | 33.8%
average  | 616.8   | 565.8   | 8.0%      | 365.7   | 48.2%


TABLE VI
LEAKAGE POWER CONSUMPTION COMPARISON AMONG STD, DVT
AND GLB LIBRARIES IN SUPER-Vr REGIME WITHOUT DELAY PENALTY

Circuits | ST_STD  | ST_DVT  | ST_DVT    | ST_GLB  | ST_GLB
         | Leakage | Leakage | Leakage   | Leakage | Leakage
         | (nW)    | (nW)    | Reduction | (nW)    | Reduction


Fig. 8. Normalized leakage power consumptions of ISCAS 85
benchmark circuits synthesized using different gate-length
biased libraries in the near-Vr regime.

Fig. 9. A potential target for transistor level GLB.


We create a few GLB cell libraries with fewer biased cells
to investigate the opportunity to further optimize the
standard cell library. More specifically, we test GLB, GLBra,
GLBrb, and GLBrc, which comprise the nominal cell plus four,
three, two, and one gate-length biased cells, respectively,
under the same delay constraint in the near-Vr regime. Fig. 8
shows the leakage power consumptions of a few ISCAS 85
benchmarks, normalized to the results of the NT_STD cell
library. One can observe that a library with more gate-length
biases achieves better leakage power performance since it
delivers a finer-grained trade-off between (cell) leakage and
delay. For example, the synthesis results of c6288 show that
NT_GLBrc, NT_GLBrb, NT_GLBra, and NT_GLB achieve 66.2%,
71.5%, 73.6%, and 76.0% leakage power reduction rates,
respectively. Another important observation is that the
library with two gate lengths significantly improves the
leakage performance compared to STD, but adding more
gate-length biases only marginally reduces the leakage
consumption, which shows a diminishing return effect.
Therefore, if the cell library is being designed with limited
resources and effort, two gate lengths are good enough to
achieve a satisfactory leakage performance. In practice, the
GLB cell library cannot be designed with an infinite number of
cells. This diminishing return effect provides insight into
how to design a GLB cell library of acceptable size that still
captures the majority of the power saving capability.
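The diminishing return can be made concrete with the c6288 reduction rates quoted in this section. The following is a small sketch, assuming only those four reported percentages; the function names are hypothetical and chosen for illustration:

```python
# Leakage power reduction (%) reported in the text for c6288 in the
# near-Vr regime, keyed by the number of biased gate lengths offered
# in addition to the nominal cell (GLBrc=1, GLBrb=2, GLBra=3, GLB=4).
C6288_REDUCTION_PCT = {1: 66.2, 2: 71.5, 3: 73.6, 4: 76.0}


def marginal_gains(reduction_by_levels: dict) -> dict:
    """Extra reduction (percentage points) contributed by each added level."""
    gains = {}
    previous = 0.0  # a library with no biased cells gives no reduction
    for levels in sorted(reduction_by_levels):
        gains[levels] = round(reduction_by_levels[levels] - previous, 1)
        previous = reduction_by_levels[levels]
    return gains


def share_of_best(reduction_by_levels: dict, levels: int) -> float:
    """Fraction of the best reported reduction captured with `levels` biases."""
    return reduction_by_levels[levels] / max(reduction_by_levels.values())


if __name__ == "__main__":
    print(marginal_gains(C6288_REDUCTION_PCT))
    print(f"{share_of_best(C6288_REDUCTION_PCT, 1):.0%}")
```

For these figures, the first biased gate length contributes 66.2 percentage points, while each further level adds only 5.3, 2.1, and 2.4 points, so a single biased gate length already captures about 87% of the best reported reduction for this circuit.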


VII. IMPACT OF TRANSISTOR LEVEL GLB TECHNIQUE 


To further explore the leakage power reduction potential
of applying Gate-Length-Biasing in the 7nm FinFET node, we
study the transistor level GLB technique in this section.
Since different transistors control different timing arcs of
a cell, transistor level GLB can individually modify the
delays of different timing arcs and is thus more flexible in
reducing leakage power consumption by applying GLB on
non-critical paths. Fig. 9 shows an example of how transistor
level GLB can be applied to a 2-input NAND gate. Assume A and
B are the two input ports of the NAND gate. When port A is on
a critical path while port B is on a non-critical path, the
NAND gate remains unbiased under cell level GLB. However,
applying GLB to the transistor gates connected to B might
further reduce leakage power while maintaining a positive
timing slack.

To analyze the costs and benefits of the transistor level GLB
technique in detail, we generate another set of libraries
consisting of transistor level gate-length biased logic cells.
We start with the basic cell analysis and then synthesize
ISCAS benchmark circuits using different transistor level GLB
libraries to compare their performance.


A. Transistor Level GLB Design 


Fig. 10. Schematic and layout of 1X 2-input NAND gates with
(a) GLB applied to input port B (b) GLB applied to input
port A.

Transistor level GLB has the potential to further reduce
leakage power, but requires a significantly larger library.
Therefore, transistor level GLB should be applied to only the
most frequently used cells [5]. In this paper, we consider
transistor level GLB for 2-input NAND gates and 2-input
NOR gates. To maintain the layout rule and design flexibility
presented in [27], we assume that, for FinFET transistors
connected to the same input, the P-type FET and N-type FET
always have the same level of biased gate length. Fig. 10
shows the schematic and layout of 1X 2-input NAND gates with
GLB applied to different input ports.
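The library growth implied by this assumption can be sketched by enumerating one gate-length level per input. This is only an illustration under the stated assumption that both FETs on an input share a level; the helper names are hypothetical, and the naming helper is meaningful only when a single biased level is used:

```python
from itertools import product


def enumerate_variants(inputs=("A", "B"), levels=("STD", "GL20")):
    """List every per-input gate-length assignment for one cell.

    Each input gets a single level because the P-type and N-type FETs
    tied to the same input are assumed to share the same gate length.
    """
    return [dict(zip(inputs, combo)) for combo in product(levels, repeat=len(inputs))]


def variant_name(assignment: dict, biased="GL20") -> str:
    """Map an assignment to the STD / GL20a / GL20b / GL20 naming of Section VII-B."""
    biased_inputs = [pin for pin, level in assignment.items() if level == biased]
    if not biased_inputs:
        return "STD"
    if len(biased_inputs) == len(assignment):
        return biased
    return biased + "".join(pin.lower() for pin in biased_inputs)


if __name__ == "__main__":
    for assignment in enumerate_variants():
        print(variant_name(assignment), assignment)
```

With one biased level, a 2-input cell has four variants (STD, GL20a, GL20b, GL20). The count grows as the number of levels raised to the number of inputs, which is why restricting transistor level GLB to a few frequently used cells keeps the library manageable.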


B. Basic Cell Analysis 


We compare the leakage power and the timing arcs of the
standard cells and different gate-length biased cells. Fig. 11
shows the normalized leakage power and timing arcs for
different gate-length biased 1X 2-input NAND cells at near-
and super-Vr voltage regimes, and Fig. 12 shows the results
for the 1X 2-input NOR. Without loss of generality, we assume
that, for the pull-down network of the 2-input NAND gate (or
the pull-up network of the 2-input NOR gate), input A is
located closer to the output and input B is closer to GND
(or VDD). In Fig. 11 and Fig. 12, STD denotes cells with a
gate length of 7nm (unbiased); GL20a and GL20b denote cells in
which a biased gate length of 9nm is applied to input port A
and B, respectively (the other input port remains unbiased);
and GL20 denotes cells in which all transistors have a biased
gate length of 9nm. We use the symbol “a rise” to denote the
normalized average rise delay from input port A to the output
port, and similar meanings apply to “a fall”, “b rise” and
“b fall”.

From Fig. 11 and Fig. 12, one can make the following
observations: (i) GL20a and GL20b cells have a smaller leakage
power reduction ratio than GL20 cells, because the transistor
level GLB technique is applied to only one of the input ports.
The leakage power reduction ratios are almost the same in the
near-Vr and super-Vr regimes. (ii) For each GL20a or GL20b
cell, the timing arcs behave very differently. Taking the
2-input NAND as an example, the rise delay of the biased input
port increases significantly, while that of the unbiased input
port remains unaffected. However, both fall delays, “a fall”
and “b fall”, are affected no matter which transistors are
gate-length biased. The reason is that applying GLB to a
transistor is like increasing its equivalent resistance. For
a 2-input NAND cell, the transistors in the pull-up network
are connected in parallel, so applying GLB to one transistor
does not affect the other. On the other hand, the transistors
in the pull-down network are connected in series, so applying
GLB to one of them increases the equivalent resistance of the
entire path. Similar explanations apply to the 2-input NOR as
well.
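The resistance argument above can be illustrated with a first-order RC sketch of the 2-input NAND. This is a simplified model of our own, not the characterization used for Fig. 11: the scaling factors k_a and k_b, the unit resistances, and the load are hypothetical, and internal node capacitance and input slew are ignored.

```python
def nand2_arc_delays(k_a=1.0, k_b=1.0, r_p=1.0, r_n=1.0, c_load=1.0):
    """First-order R*C delays of the four timing arcs of a 2-input NAND.

    k_a and k_b scale the on-resistance of the transistors driven by
    inputs A and B (1.0 means unbiased).
    """
    pull_down = (k_a + k_b) * r_n  # series NMOS stack: resistances add
    return {
        "a rise": k_a * r_p * c_load,  # only the PMOS driven by A conducts
        "b rise": k_b * r_p * c_load,  # only the PMOS driven by B conducts
        "a fall": pull_down * c_load,
        "b fall": pull_down * c_load,
    }


def normalized_to_std(k_a=1.0, k_b=1.0):
    """Arc delays of a biased cell divided by those of the unbiased cell."""
    std = nand2_arc_delays()
    biased = nand2_arc_delays(k_a, k_b)
    return {arc: biased[arc] / std[arc] for arc in std}


if __name__ == "__main__":
    # Hypothetical 30% resistance increase on the transistors driven by B.
    print(normalized_to_std(k_b=1.3))
```

With the hypothetical factor k_b = 1.3, the model leaves “a rise” unchanged, raises “b rise” by the full factor, and raises both fall arcs by a smaller amount, which matches the qualitative behavior described in observation (ii).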

From the above observations, transistor level GLB has the
potential to further reduce leakage power, but it has several
limitations. First, transistor level GLB is only useful in
very special timing arc situations, because applying GLB to
one input port still affects one timing arc (fall delay for
NAND or rise delay for NOR) of the other input port. In
addition, transistor level GLB is of very little use in the
super-Vr regime, because it has no obvious advantage over
cell level GLB in terms of delay increase.
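The first limitation can be seen in a toy per-cell selection rule. This is a hedged sketch, not the procedure used by the synthesis tool: the leakage and delay penalty numbers below are hypothetical placeholders, and the rule simply picks the lowest-leakage variant whose added delay fits within the slack available on each input.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    name: str
    leakage: float    # normalized to the STD cell (hypothetical values)
    penalty_a: float  # worst added delay on arcs from input A (arbitrary units)
    penalty_b: float  # worst added delay on arcs from input B (arbitrary units)


# Hypothetical NAND2 variants. Biasing one input also adds a small
# penalty on the other input because the series stack is shared.
VARIANTS = (
    Variant("STD", 1.00, 0.0, 0.0),
    Variant("GL20a", 0.80, 3.0, 1.0),
    Variant("GL20b", 0.80, 1.0, 3.0),
    Variant("GL20", 0.60, 3.0, 3.0),
)


def pick_variant(slack_a: float, slack_b: float, variants=VARIANTS) -> Variant:
    """Return the lowest-leakage variant whose penalties fit both slacks.

    Falls back to the first variant (STD) when nothing fits.
    """
    feasible = [
        v for v in variants if v.penalty_a <= slack_a and v.penalty_b <= slack_b
    ]
    return min(feasible, key=lambda v: v.leakage, default=variants[0])


if __name__ == "__main__":
    print(pick_variant(0.0, 5.0).name)  # input A has no slack at all
    print(pick_variant(1.5, 5.0).name)  # input A has a little slack
    print(pick_variant(5.0, 5.0).name)  # both inputs have ample slack
```

Under these placeholder numbers, an input A with zero slack prevents biasing input B even though B has ample slack, because the shared arc penalty on A cannot be absorbed. A small amount of slack on A is enough to make the GL20b variant usable.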


C. ISCAS 85 Benchmark Results 


We also synthesize ISCAS 85 benchmark circuits using cell
libraries that include transistor level GLB cells. Fig. 13
shows the normalized leakage power consumptions of ISCAS
benchmark circuits synthesized using different transistor
level GLB libraries in the near- and super-Vr regimes. In this
figure, NT_GLB20ta (or NT_GLB20tb) denotes the standard cell
library which includes GL20a (GL20b) cells in the near-Vr
regime, and NT_GLB20tab denotes the library with both GL20a
and GL20b cells. Similar definitions apply to ST_GLB20ta,
ST_GLB20tb and ST_GLB20tab. Since we are interested in the
effect of transistor level GLB, we normalize to the leakage
power obtained with NT_GLB20rc (or ST_GLB20rc), the library
which contains standard cells and GL20 cells.


It can be observed that, in the near-Vr regime, the leakage
power reduction from transistor level GLB varies across ISCAS
benchmark circuits. For small circuits such as c499,
transistor level GLB achieves only a 2% leakage power
reduction compared with cell level GLB, while in relatively
larger circuits this number can exceed 10%. The reason is
that, based on the observations in Section VII-B, transistor
level GLB is useful only in very special timing arc
situations, and larger circuits offer more flexibility to
re-select logic cells and exploit the leakage power reduction
from transistor level GLB. In general, the transistor level
GLB technique has great potential to further reduce the
leakage power in the near-Vr regime. However, in the super-Vr
regime, the leakage power reduction from transistor level GLB
is very small (less than 1%). This result is consistent with
the conclusion in Section VII-B that transistor level GLB has
limited use in the super-Vr regime.


Another observation from Fig. 13 is that including one kind
of transistor level GLB cells (either GL20a or GL20b) is good
enough to exploit the advantage of applying the transistor
level GLB technique in leakage power reduction.

Fig. 13. Normalized leakage power consumptions of ISCAS
benchmark circuits synthesized using different transistor
level GLB libraries in (a) near-Vr regime and (b) super-Vr
regime.


VIII. CONCLUSION 


A fine-grained GLB technique was presented for leakage
power reduction of deeply-scaled FinFET circuits operating in
near- and super-Vr voltage regimes. The impacts of GLB on
circuit speed and leakage were studied. Compared to Dual-Vr,
the proposed GLB technique was shown to be more suitable
for deeply-scaled FinFET circuits in the near-Vr regime and
as good as Dual-Vr in the super-Vr regime, owing to its
capability of delivering a fine-grained trade-off between
leakage power and speed. In the circuit synthesis results, the
presented fine-grained GLB technique achieved up to 70%
leakage power reduction in both near- and super-Vr regimes
with zero delay degradation and minor area increase. The
benefits and constraints of the transistor-level GLB technique
were also studied. In this technique, each transistor can
individually modify delays of different timing arcs. Results
showed that up to 10% of additional leakage power reduction
can be achieved by applying the transistor-level GLB technique
in the near-threshold regime at the expense of increased cell
library size.